Video processing system providing overlay of selected geospatially-tagged metadata relating to a geolocation outside viewable area and related methods

ABSTRACT

A video processing system may include a display, at least one geospatial database, and a video processor. The video processor may cooperate with the display and the at least one geospatial database and be configured to display a georeferenced video feed on the display and defining a viewable area, and to overlay selected geospatially-tagged metadata onto the viewable area and relating to a geolocation outside the viewable area.

FIELD OF THE INVENTION

The present invention relates to the field of video processing, and, more particularly, to processing of geospatially referenced video and related methods.

BACKGROUND OF THE INVENTION

Enhancements in video imaging, data storage capabilities, and satellite location technology have led to the relatively widespread use of georeferenced video in numerous applications such as reconnaissance, surveillance, surveying, and the like. Generally speaking, georeferenced video includes video imagery frames encapsulated in a transport stream along with geospatial metadata that correlates the pixel space of the imagery to geospatial coordinate values (e.g., latitude/longitude coordinates).

Given the large amounts of georeferenced video data that can be generated and stored with such technology, it can become difficult to communicate the video and associated metadata to users in a relatively straightforward and intuitive way. Various approaches are used to communicate video-related information to users. One approach is set forth in U.S. Pat. No. 7,559,017 to Datar et al., which discloses a system for transferring annotations associated with a media file. An annotation associated with a media file is indexed to a first instance of that media file. By comparing features of the two instances, a mapping is created between the first instance of the media file and a second instance of the media file. The annotation can be indexed to the second instance using the mapping between the first and second instances. The annotation can be processed (displayed, stored, or modified) based on the index to the second instance.

Another potential difficulty with georeferenced video is how to provide desired situational awareness. Various approaches for improving situational awareness have been developed. One example is set forth in U.S. Pat. No. 6,392,661, which discloses an apparatus for arranging and presenting situational awareness information on a computer display screen using maps and/or other situational awareness information, so that greater amounts of relevant information can be presented to a user within the confines of the viewable area on small computer screen displays. The map display layout for a screen display utilizes multiple, independent map displays arranged on a computer screen to maximize situational awareness information and display that information efficiently. The apparatus provides the ability to independently scale with respect to distance, time and velocity, as well as zoom and pan each map on the screen display.

Another problem which may be encountered with sensor data providing georeferenced video is that position accuracy may vary from one sensor type to the next. One approach for addressing inaccurate geospatial images is through the use of image registration, i.e., where newly captured images are compared with reference images with known accurate coordinates to provide a basis for correcting the newly captured image geospatial metadata. In accordance with one exemplary approach, U.S. Pat. No. 6,957,818 to Kumar et al. discloses a system for accurately mapping between image coordinates and geo-coordinates, called geo-spatial registration. The system utilizes the imagery and terrain information contained in a geo-spatial database to align geodetically calibrated reference imagery with an input image, e.g., dynamically generated video images, and thus achieve a high accuracy identification of locations within the scene. When a sensor, such as a video camera, images a scene contained in the geo-spatial database, the system recalls a reference image pertaining to the imaged scene. This reference image is aligned with the sensor's images using a parametric transformation. Thereafter, other information that is associated with the reference image can be overlaid upon or otherwise associated with the sensor imagery.

Tracking objects within georeferenced video feeds is also a desirable feature that may be problematic in some circumstances. One particularly advantageous system in this regard is the Full-Motion Video Asset Management Engine (FAME™) from the present Assignee Harris Corporation. The FAME™ system speeds the process of analyzing a wide range of intelligence information. For geospatial analysis, the FAME™ system has a mapping interface that provides a visual display for the sensor track and location of frames of video from an unmanned aerial vehicle (UAV) or other source. This tool allows indexing, search, retrieval, and sensor tracking in real time during play out. Further exploitation of geospatial metadata is done by extracting embedded Key-Length-Value (KLV) metadata from the video stream.

Despite the advantages of such approaches, further functionality may be desirable for processing and displaying georeferenced video feeds.

SUMMARY OF THE INVENTION

In view of the foregoing background, it is therefore an object of the present invention to provide a system and related methods for enhanced processing of georeferenced video.

This and other objects, features, and advantages are provided by a video processing system which includes a display, at least one geospatial database, and a video processor. More particularly, the video processor cooperates with the display and the at least one geospatial database and is configured to display a georeferenced video feed on the display and defining a viewable area, and overlay selected geospatially-tagged metadata onto the viewable area and relating to a geolocation outside the viewable area. As such, the system advantageously provides geospatial metadata for objects that are outside of a viewable area as a helpful reference for guidance, tracking, and/or orientation, for example.

By way of example, the selected geospatially-tagged metadata may comprise at least one of geospatially referenced feature annotations, geospatially referenced video source locations, and geospatially referenced points of interest. The video processor may include an overlay generator configured to overlay the selected geospatially-tagged metadata onto the viewable area. Moreover, the overlay generator may generate at least one indicator for the selected geospatially-tagged metadata. The at least one indicator may comprise at least one of a range indicator and a bearing indicator, for example.

The video processor may also include a request handler configured to accept a query of the at least one geospatial database to generate the selected geospatially-tagged metadata. The query may be based upon at least one filtering parameter. By way of example, the at least one filtering parameter may comprise at least one of a subject category filtering parameter, and a distance filtering parameter.

Further, the at least one geospatial database may comprise a first geospatial database of fixed geospatially-tagged metadata, and a second geospatial database of variable geospatially-tagged metadata. The video feed may be a live video feed, for example.

A related video processing method may include displaying a georeferenced video feed on a display and defining a viewable area. The method may further include overlaying selected geospatially-tagged metadata onto the viewable area and relating to a geolocation outside the viewable area.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram of a video processing system in accordance with one aspect of the invention providing overlayed geospatially-tagged metadata onto a viewable display area and relating to a geolocation outside of the viewable area.

FIG. 2 is a schematic block diagram of an alternative embodiment of the video processing system of FIG. 1.

FIG. 3 is a view of the display of the system of FIG. 1 showing an aerial image with overlayed geospatially-tagged metadata.

FIGS. 4 and 5 are flow diagrams illustrating video processing method aspects associated with the systems of FIGS. 1 and 2.

FIG. 6 is a schematic block diagram of a video processing system in accordance with another aspect of the invention providing geospatial correlation of annotations to an object in overlapping geospatial video feeds.

FIG. 7 is a schematic block diagram of an alternative embodiment of the video processing system of FIG. 6.

FIGS. 8A and 8B are respective frame views of overlapping geospatial video feeds taken from different vantage points and showing correlation of an annotation to an object from the first feed to the second feed.

FIGS. 9 and 10 are flow diagrams illustrating video processing method aspects associated with the systems of FIGS. 6 and 7.

FIG. 11 is a schematic block diagram of a video processing system in accordance with still another aspect of the invention providing a successively expanding search area for moving objects when outside of a viewable area to allow for ready searching for the moving object when it is within the search area.

FIGS. 12 and 13 are flow diagrams illustrating method aspects associated with the system of FIG. 11.

FIGS. 14A-14D are a series of display views illustrating the use of a successively expanding search area by the video processing system of FIG. 11.

FIG. 15 is a schematic block diagram of a video processing system in accordance with yet another aspect of the invention providing correction of geospatial metadata among a plurality of georeferenced video feeds.

FIG. 16 is a schematic block diagram of an alternative embodiment of the video processing system of FIG. 15.

FIGS. 17 and 18 are flow diagrams illustrating method aspects associated with the systems of FIGS. 15 and 16.

FIG. 19 is a view of the display of the system of FIG. 15 illustrating geospatial metadata correction operations performed by the systems of FIGS. 15 and 16.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like numbers refer to like elements throughout, and prime notation is used to indicate similar elements in alternate embodiments.

Referring initially to FIGS. 1 and 4, a video processing system 30 and associated method aspects are first described. The system 30 illustratively includes a display 31, one or more geospatial databases 32, and a video processor 33. By way of example, the video processors described herein, and the various functions that they perform, may be implemented using a combination of computer hardware (e.g., microprocessor(s), memory, etc.) and software modules including computer-executable instructions, as will be appreciated by those skilled in the art. Similarly, the geospatial database(s) 32 may be implemented using a suitable storage device(s) (i.e., memory) and a database server application to be run on a computing platform and including computer-executable instructions for performing the various data storage and retrieval operations described herein, as will also be appreciated by those skilled in the art.

As noted above, situational awareness in video can be difficult to achieve using prior art approaches. With the ever-larger amounts of georeferenced video data being generated, intuitive approaches for communicating geospatial metadata information associated with the georeferenced videos to the viewer are desirable. Otherwise, rapid analysis of geospatial information, which may be required in certain applications, may prove difficult.

While existing satellite positioning technology (e.g., GPS units) allows for some degree of situational awareness, this is typically not the case in a video environment. In the above-noted FAME™ system, video and metadata from multiple sources may be viewed by many different people, and situational awareness is accomplished through the referencing of external applications or area maps, such as Google™ Earth, for example. While annotations may be added to video by users, those annotations typically cannot be referenced from other videos or visualization tools.

The system 30 advantageously provides a unified approach to manage geospatially tagged metadata, user-defined features, and points of interest, which may be implemented in a video platform such as the FAME™ system, for example, although the present invention may be used with other suitable systems as well. That is, the system 30 may advantageously be used in a video-centric environment to apply reverse geocoding techniques to increase real-time situational awareness in video.

In particular, the video processor 33 cooperates with the display 31 and the database 32 and is configured to display a georeferenced video feed on the display and defining a viewable area, at Blocks 40-41. Referring additionally to the example in FIG. 3, and to FIG. 2, an aerial view of a vehicle 34 being tracked by a video sensor as it travels along a road 35 is shown, which defines a viewable area (i.e., what can be seen on the display 31). An object highlight box 36 and accompanying annotation (“Ground: Vehicle”) indicates to the viewer or operator that the object is being tracked, and what the object is, although such indicators are not required in all embodiments. To provide enhanced situational awareness, the video processor 33 advantageously overlays selected geospatially-tagged metadata onto the viewable area and relating to a geolocation outside the viewable area, at Block 42, thus concluding the illustrated method (Block 43).

In the illustrated example, two such indicators are included, namely a bearing indicator (i.e., arrow) 37 indicating a bearing to the selected geospatial location outside the viewable area, as well as a range indicator 38 indicating a range thereto (“Airstrip: 500 m”). By way of example, the distance may be measured between a current frame center (as determined from sensor metadata) and the desired feature location obtained from the internal geospatial database 51′. The bearing angle may be measured between true north and a line connecting the current frame center and the selected feature, for example.
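As a hedged illustration of such a range and bearing computation, the following Python sketch assumes latitude/longitude coordinates for both the frame center and the stored feature, and uses a flat-earth approximation that is adequate over the short distances involved; the function and variable names are ours, not the patent's:

```python
import math

def range_and_bearing(center_lat, center_lon, feat_lat, feat_lon):
    """Approximate range (meters) and bearing (degrees from true north)
    from the current frame center to an off-screen feature location,
    using an equirectangular (flat-earth) approximation."""
    earth_radius_m = 6371000.0
    d_lat = math.radians(feat_lat - center_lat)
    d_lon = math.radians(feat_lon - center_lon) * math.cos(math.radians(center_lat))
    north_m = d_lat * earth_radius_m   # displacement toward true north
    east_m = d_lon * earth_radius_m    # displacement toward east
    range_m = math.hypot(north_m, east_m)
    bearing_deg = math.degrees(math.atan2(east_m, north_m)) % 360.0
    return range_m, bearing_deg

# Example: label an off-screen airstrip roughly 500 m to the north.
rng, brg = range_and_bearing(35.5155, -117.2654, 35.5200, -117.2654)
print(f"Airstrip: {rng:.0f} m at bearing {brg:.0f} deg")
```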

By way of example, the selected geospatially-tagged metadata may comprise at least one of geospatially referenced feature annotations, geospatially referenced video source locations, and geospatially referenced points of interest. Turning additionally to FIG. 5, annotations/metadata may be provided by an external geospatial database 50′, such as a commercial off-the-shelf (COTS) database including geospatial points of interest, etc., as well as through user input (i.e., user-generated geospatial data). The external and user-generated geospatial data may be stored in a common internal geospatial database 51′ for convenience of access by the video processor 33′ in the illustrated example, but the geospatial metadata may be distributed among multiple databases or sources in other embodiments. The database 51′ stores the respective geospatial features along with their absolute latitude and longitude, as will be appreciated by those skilled in the art.

The external geospatial database 50′ may be conceptually viewed as a fixed or static set of geospatial data, even though such commercially available data sets may be customized or modified in some embodiments, and the user-generated geospatial data may be considered as variable data that may be readily changed by users. That is, the system 30′ advantageously provides for reverse geocoding with both static and user-defined geospatial features on a video-centric platform.

In particular, the video processor 33′ illustratively includes a request handler 52′ configured to accept a query or request from a user. The query is communicated to the geospatial database 51′ to generate selected geospatially-tagged metadata that satisfies the given query (i.e., filtering) parameters, at Blocks 44′-45′. The query may be based upon one or more filtering parameters, which in the present example include a category filtering parameter (i.e., airports) and a distance filtering parameter (i.e., within 50 km). By way of example, the category filtering parameters may include categories such as buildings, landmarks (e.g., airports or airfields, etc.), natural formations (e.g., rivers, mountains, etc.), vehicles, etc.
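By way of a hedged illustration of such a filtered query, the sketch below filters stored feature records by subject category and by flat-earth distance from the frame center; the record fields and sample data are hypothetical and not taken from the patent:

```python
import math

def query_features(features, center, category=None, max_range_m=None):
    """Illustrative request-handler query: return features matching an
    optional category filter and an optional distance filter measured
    from the current frame center (decimal degrees)."""
    def dist_m(lat1, lon1, lat2, lon2):
        dy = math.radians(lat2 - lat1) * 6371000.0
        dx = math.radians(lon2 - lon1) * 6371000.0 * math.cos(math.radians(lat1))
        return math.hypot(dx, dy)

    hits = []
    for feat in features:
        if category is not None and feat["category"] != category:
            continue
        rng = dist_m(center[0], center[1], feat["lat"], feat["lon"])
        if max_range_m is None or rng <= max_range_m:
            hits.append((feat["name"], rng))
    return hits

features = [
    {"name": "Airstrip", "category": "airport", "lat": 35.5200, "lon": -117.2654},
    {"name": "Dam", "category": "landmark", "lat": 35.9000, "lon": -117.3000},
]
# e.g., all airports within 50 km of the current frame center
print(query_features(features, (35.5155, -117.2654), category="airport", max_range_m=50000))
```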

The video processor 33′ further illustratively includes a marker handler 53′ which is configured to overlay the selected geospatially-tagged metadata obtained from the database 51′, if any (Block 46′), onto the viewable area. The video processor 33′ also illustratively includes an overlay generator 54′ for overlaying the appropriate annotation(s) on the video feed displayed on the display 31′ as described above. This advantageously allows the video to be viewed by the user as normal, while at the same time providing ready access to information for off-screen or out-of-view features of interest, including names, locations, and any other relevant information stored in the database 51′. Other information, such as population, size, speed, priority level, etc., may also be included in the database 51′ and handled by the marker handler 53′. Location information for mobile or moving off-screen objects may be provided by a secondary tracking system, e.g., a secondary user viewing station with a separate video interfaced to the system 31′, as will be appreciated by those skilled in the art.
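One conceivable way for the overlay generator to anchor an off-screen indicator at the edge of the viewable area, given a bearing such as the one computed above, is sketched below; it assumes a north-up frame, whereas a real implementation would first rotate by the sensor heading taken from the metadata, and the helper name is purely illustrative:

```python
import math

def edge_indicator_position(bearing_deg, frame_w, frame_h, margin=20):
    """Illustrative overlay helper: place an arrow/label for an off-screen
    feature on the inset border of the viewable area, in the direction of
    its bearing from the frame center (frame assumed north-up)."""
    cx, cy = frame_w / 2.0, frame_h / 2.0
    dx = math.sin(math.radians(bearing_deg))    # screen x: east
    dy = -math.cos(math.radians(bearing_deg))   # screen y grows downward, so north is -y
    # Scale the unit direction vector until it touches the inset frame border.
    scale = min((cx - margin) / abs(dx) if dx else float("inf"),
                (cy - margin) / abs(dy) if dy else float("inf"))
    return int(round(cx + dx * scale)), int(round(cy + dy * scale))

# e.g., anchor the "Airstrip: 500 m" label for a feature at bearing 045 degrees
print(edge_indicator_position(45.0, 1280, 720))
```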

Exemplary applications for the systems 30, 30′ may include surveillance, planning, or reconnaissance applications where it is desirable to remain aware of objects or features which are out of frame. Moreover, the systems 30, 30′ may also advantageously be used for location-based services and advertising, as will be appreciated by those skilled in the art.

Turning additionally to FIGS. 6-10, another video processing system 60 and associated method aspects are now described. In certain applications where multiple geospatial video feeds are used at the same time, it may be desirable to correlate geospatial metadata across the various video data sets. That is, as imagery/video detectors or sensors become more ubiquitous, there is a need to translate geospatial metadata between the different sensor feeds. By way of example, such applications may include government, emergency services, and broadcast industry applications. In the government sector, annotation standards and tools to accommodate those standards are in current development. In the commercial sector, telestration tools are in use, but are typically limited to visual annotation of still frames.

Generally speaking, the system 60 advantageously allows for transferring visual annotations between disparate sensors. Extracted metadata may be utilized to spatially correlate sensor perspectives. In real time, annotations may be projected onto an alternative georeferenced video feed, whether temporal or non-temporal in nature. Moreover, annotations may be transferred onto temporal data which overlaps spatially within a user-defined offset, and annotations may also be transferred onto spatially overlapping non-temporal data.

More particularly, the video processing system 60 illustratively includes a first video input 61 configured to receive a first georeferenced video feed from a first video source, and a second video input 62 configured to receive a second georeferenced video feed from a second video source, which overlaps the first georeferenced video feed, at Blocks 90-91, as will be discussed further below. The system 60 further illustratively includes a video processor 63 coupled to the first and second video inputs 61, 62. The video processor 63 further illustratively includes an annotation module 64 configured to generate an annotation for an object in the first georeferenced video feed, at Block 92. The video processor 63 also illustratively includes a geospatial correlation module 65 configured to geospatially correlate the annotation to the object in the second georeferenced video feed overlapping the first georeferenced video feed, at Block 93, thus concluding the method illustrated in FIG. 9 (Block 94). Accordingly, the video processing system 60 advantageously allows annotations made in one perspective to be translated to other perspectives, and thus provides tracking abilities and correlation of objects between different georeferenced video feeds.

More particularly, one example in which the first and second georeferenced video feeds overlap one another will be further understood with reference to FIGS. 7, 8A, and 8B. In the example of FIG. 7, the first video source includes a first video camera 70′ and a first metadata generation module 72′, while the second video source includes a second video camera 71′ and a second metadata generation module 73′. That is, the cameras 70′, 71′ generate video imagery based upon their particular vantage points, while the metadata generation modules 72′, 73′ generate respective metadata for each of the video feeds, at Block 95′. In some embodiments, the metadata generation modules 72′, 73′ may be incorporated within the cameras 70′, 71′. In other embodiments, the cameras 70′, 71′ may be stationary cameras, etc., without the capability to produce metadata (e.g., traffic cameras), and the metadata generation modules may be inserted downstream to create the necessary metadata and package it together with the video imagery in a media transport format, as will be appreciated by those skilled in the art.

In the illustrated example, the video cameras 70′, 71′ are directed at a common scene, namely a football player 80 who in FIG. 8A is diving toward the first video camera 70′ to make a catch. The video frame of FIG. 8B shows the same player making the catch from a side view a few moments later in time. In the illustrated example, an annotation 81 is input from a telestrator via the annotation module 64′ by a user (e.g., a commentator). The annotation reads “keeps his eye on the ball,” and it is overlayed on the video feeds as shown. Of course, other suitable input devices for providing annotation input (e.g., computer mouse/keyboard, etc.) may also be used. Various types of visual, textual, etc., annotations may be used, as will be appreciated by those skilled in the art.

The video processor 63′ illustratively includes a geospatial metadata extraction module 74′ for extracting geospatial metadata from the first and second georeferenced video feeds (Block 91′), which may also be stored in a metadata database 75′ (e.g., a COTS database). An archival storage device or database 77′ may also be included and configured to store the first and second georeferenced video feeds. The archive storage database 77′ may also be implemented with a COTS database or other data storage medium.

The geospatial correlation module 65′ illustratively includes a coordinate transformation module 76′ configured to transform geospatial coordinates for the annotation in the first georeferenced video feed to pixel coordinates in the second georeferenced video feed. Moreover, the first and second video sources may have respective first and second source models associated therewith, and the transformation module may perform affine transformations using the first and second source models.

More particularly, the affine transformations between image and ground space (and vice versa) are performed using sensor models that are unique to each sensor (here the video cameras 70′, 71′), according to the following equation:

$\begin{bmatrix} d_{x} \\ d_{y} \\ d_{z} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_{x} & \sin\theta_{x} \\ 0 & -\sin\theta_{x} & \cos\theta_{x} \end{bmatrix} \begin{bmatrix} \cos\theta_{y} & 0 & -\sin\theta_{y} \\ 0 & 1 & 0 \\ \sin\theta_{y} & 0 & \cos\theta_{y} \end{bmatrix} \begin{bmatrix} \cos\theta_{z} & \sin\theta_{z} & 0 \\ -\sin\theta_{z} & \cos\theta_{z} & 0 \\ 0 & 0 & 1 \end{bmatrix} \left( \begin{bmatrix} a_{x} \\ a_{y} \\ a_{z} \end{bmatrix} - \begin{bmatrix} c_{x} \\ c_{y} \\ c_{z} \end{bmatrix} \right)$

where a is the ground point, c is the location of the camera, and θ is the rotation of the camera (compounded with platform rotation). As will be appreciated by those skilled in the art, accuracy may be increased by using an elevation surface, rather than a spheroidal/ellipsoidal reference surface, in some embodiments, if desired.
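A minimal numeric sketch of the equation above is given below, assuming NumPy is available and that the rotation angles (in radians) already compound the sensor and platform rotations; projection of the resulting camera-frame point d into pixel coordinates through the sensor's focal model would follow and is omitted here:

```python
import numpy as np

def ground_to_camera(ground_pt, cam_pos, theta_x, theta_y, theta_z):
    """Rotate the camera-relative ground vector (a - c) by the compounded
    rotation angles to obtain the point d in the camera frame, per the
    equation above."""
    rx = np.array([[1, 0, 0],
                   [0, np.cos(theta_x), np.sin(theta_x)],
                   [0, -np.sin(theta_x), np.cos(theta_x)]])
    ry = np.array([[np.cos(theta_y), 0, -np.sin(theta_y)],
                   [0, 1, 0],
                   [np.sin(theta_y), 0, np.cos(theta_y)]])
    rz = np.array([[np.cos(theta_z), np.sin(theta_z), 0],
                   [-np.sin(theta_z), np.cos(theta_z), 0],
                   [0, 0, 1]])
    return rx @ ry @ rz @ (np.asarray(ground_pt, float) - np.asarray(cam_pos, float))

# e.g., a ground point 100 m east of a camera 50 m up, with a small roll angle
print(ground_to_camera([100.0, 0.0, 0.0], [0.0, 0.0, 50.0], 0.1, 0.0, 0.0))
```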

In instances where real-time processing is desired, or the spatial metadata is not sufficient to construct a desired sensor model, other methods (such as interpolation of corner points) may also be used (Block 92′), but potentially with reduced accuracy, as will also be appreciated by those skilled in the art. By way of example, the data below is an excerpt from a standard set of Scan Eagle metadata that can be used to specify the transformation to and from ground space:

<Slant_Range>1228.2802734375</Slant_Range>
<Sensor_Roll_Angle>6.440000057220459</Sensor_Roll_Angle>
<Sensor_Pointing_Azimuth>242.3699951171875</Sensor_Pointing_Azimuth>
<Sensor_Elevation_Angle>243.3899993896484</Sensor_Elevation_Angle>
<Aircraft_roll_angle>11.06999969482422</Aircraft_roll_angle>
<Aircraft_pitch_angle>−0.09000000357627869</Aircraft_pitch_angle>
<Aircraft_heading_angle>82</Aircraft_heading_angle>
<Field_of_View>8.010000228881836</Field_of_View>
<Sensor_Altitude>1830.019165039063</Sensor_Altitude>
<Sensor_Latitude>35.51555555555556</Sensor_Latitude>
<Sensor_Longitude>−117.2654444444444</Sensor_Longitude>
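As an illustrative sketch only, an XML-style metadata excerpt of this kind could be parsed into a flat dictionary as follows; a wrapper element is added solely so the fragment is well-formed, and only a few representative fields are reproduced:

```python
import xml.etree.ElementTree as ET

excerpt = """
<Slant_Range>1228.2802734375</Slant_Range>
<Sensor_Pointing_Azimuth>242.3699951171875</Sensor_Pointing_Azimuth>
<Sensor_Latitude>35.51555555555556</Sensor_Latitude>
<Sensor_Longitude>-117.2654444444444</Sensor_Longitude>
"""

# Wrap the fragment in a root element so it parses as well-formed XML.
root = ET.fromstring(f"<metadata>{excerpt}</metadata>")
fields = {child.tag: float(child.text) for child in root}
print(fields["Slant_Range"], fields["Sensor_Latitude"])
```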

Furthermore, the geospatial correlation module 65′ may further include a velocity model module 78′ configured to generate velocity models of the object (i.e., the football player 80 in FIGS. 8A and 8B) in the first and second georeferenced video feeds for tracking the object therebetween, at Block 93′. Velocity models are computed to provide accurate interpolation between sensors with differing collect intervals. Pixel-based tracking may be used to reacquire the object of interest and continue tracking between the various video feeds so that the annotation may continue to follow the tracked object as the video progresses. Exemplary velocity models and tracking operations will be discussed further below. The system 60′ further illustratively includes one or more displays 79′ coupled to the video processor 63′ for displaying one or both of the video feeds with the overlayed annotation(s).

The systems 60, 60′ thus advantageously provide for “chaining” visual sensors to track annotated objects across wide areas for real-time or forensic purposes. Moreover, this may also reduce the user workload necessary to mark up multiple sources, as well as improve user situational awareness. This approach may also be used to automatically enhance metadata repositories (since metadata generated for one feed may automatically be translated over to other overlapping feeds), and it has application across multiple media source types including video, motion imagery, and still imagery.

Turning additionally to FIGS. 11-14D, another exemplary video processing system 110 and associated method aspects are now described. By way of background, maintaining situational awareness over a large area using georeferenced video may be difficult. Current tracking technologies may be of some help in certain applications, but such trackers are usually limited to tracking objects within a viewable frame area and which are not occluded. Since video has become an important tool for decision making in tactical, disaster recovery, and other situations, tools to enhance the effectiveness of video as an informative data source are desirable. Generally speaking, the system 110 maintains an up-to-date geospatial location for tracked objects and generates a movement model for each tracked object. Using the movement model and the latest (i.e., last known) geospatial coordinates of the object, the object's location may be predicted even if primary tracking is lost due to the object no longer being within the viewable frame area, occlusion, etc. As such, the system 110 may be particularly advantageous for civil programs (e.g., police, search and rescue, etc.), and various video applications such as telestration tools, collaborative video tools, etc., as will be appreciated by those skilled in the art.

More particularly, the video processing system 110 illustratively includes a display 111 and a video processor 112 coupled to the display. Beginning at Block 120, the video processor 112 illustratively includes a display module 114 configured to display a georeferenced video feed on the display defining a viewable area, at Block 121. The video processor 112 further illustratively includes a geospatial tracking module 115 configured to determine actual geospatial location data for a selected moving object 140 within the viewable area, at Block 122, which in the example shown in FIG. 14A is a vehicle traveling along a road 141. The modules 114, 115 may be implemented with an existing video processing platform that performs pixel tracking, such as the above-noted FAME™ system, for example. Moreover, a tracking indicator and/or annotation may also be displayed along with the object being tracked, as discussed above with reference to FIG. 3, for example (Block 121′).

The module 115 is further configured to generate estimated geospatial location data along a predicted path for the moving object 140 when the moving object is no longer within the viewable area and based upon the actual geospatial location data, at Blocks 123-124, and as seen in FIG. 14B. The predicted path is visually represented by an arrow 144. The moving object 140 (which is not shown in FIGS. 14B and 14C to illustrate that it is outside the viewable area) may cease to be within the viewable area for a variety of reasons. As noted above, the object may go behind or underneath a structure (e.g., building, tunnel, etc.), which occludes the moving object 140 from view by the sensor. Another reason is that the object moves outside of the viewable area. Still another reason is that the zoom ratio of the image sensor capturing the video feed may be changed so that the moving object 140 is no longer within the viewable area, which is the case illustrated in FIGS. 14A-14D.

More particularly, in FIG. 14B the viewable area that can be seen on the display 111 is zoomed in to focus on a house 142. In other words, the moving object 140 and road 141 would no longer be viewable to the user on the display 111, although the imagery included within the original viewable area 143 from FIG. 14A is still shown in FIG. 14B for reference. When the moving object 140 is no longer within the viewable window, the video processor 112 defines a successively expanding search area 145 for the moving object 140 based upon the estimated geospatial location data (i.e., the last known geospatial position of the object before it was lost from the viewable area), at Block 125. The viewable area is defined by a first set of boundary pixels in pixel space (e.g., the corner points of the viewable area), and the video processor 112 may be further configured to define the successively expanding search area by a second set of boundary pixels (e.g., a second set of corner pixel points), for example, in the case of a rectangular boundary area. Other boundary area shapes may also be used, if desired (e.g., circular, etc.).

As can be seen in the sequence of FIGS. 14B through 14D, the search area 145 continues to expand while the moving object 140 is outside of the viewable area. This is because the longer the object 140 is outside the viewable area, the more the confidence in the object's estimated position 146 decreases. That is, the velocity of the object 140 may change from the last known velocity, which would result in the object being nearer or farther away from an estimated location 146 based solely on its last known velocity. Thus, the search area 145 may advantageously expand to accommodate a range of increasing and decreasing velocities as time progresses.
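One possible way to grow such a search area in ground coordinates is sketched below; this is an assumed construction rather than the patent's specific algorithm, the expansion constants are purely illustrative, and the returned rectangle would then be mapped to the second set of boundary pixels through the sensor model:

```python
import math

def search_area(last_pos_m, heading_deg, last_speed_mps, elapsed_s, accel_margin_mps=5.0):
    """Illustrative expanding search region: a ground-plane rectangle (meters,
    relative to the last known position) centered on the location predicted
    along the last heading, growing with time out of view to cover faster or
    slower speeds and possible turns."""
    predicted_dist = last_speed_mps * elapsed_s
    uncertainty = accel_margin_mps * elapsed_s        # grows the longer the object is lost
    dx = math.sin(math.radians(heading_deg)) * predicted_dist
    dy = math.cos(math.radians(heading_deg)) * predicted_dist
    cx, cy = last_pos_m[0] + dx, last_pos_m[1] + dy   # predicted center
    half = 0.2 * predicted_dist + uncertainty         # half-width of the search box
    return (cx - half, cy - half, cx + half, cy + half)  # min_x, min_y, max_x, max_y

# e.g., vehicle lost 12 s ago while heading east at 15 m/s
print(search_area((0.0, 0.0), 90.0, 15.0, 12.0))
```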

To improve accuracy, in some instances knowledge of the moving object's last position may be used in refining the velocity model. For example, if the object 140 was at an intersection and had just begun moving when it was lost from the viewable area, knowledge of the speed limit would allow the video processor 112 to refine the velocity model to account for acceleration up to the speed limit, and use the speed limit as the estimated rate of travel from that point forward. Another way in which the expandable search area 145 could be adjusted to account for the particular area where the object is would be if the projected path of the object takes it to an intersection, where the object could potentially change its direction of travel. In such a case, the rate of expansion of the search area 145 could be increased to account for the potential change in direction, as well as continued travel along the predicted path. Other similar refinements to the rate of expansion of the search area 145 may also be used, as will be appreciated by those skilled in the art.

In the present example, the moving object 140 is once again within the viewable area in FIG. 14D. Here, this occurs because the operator has zoomed out to the original viewable area shown in FIG. 14A. With typical pixel tracking systems, it would be difficult to automatically re-acquire the moving object 140 without user intervention because the pixel tracker would not know where to begin looking for the object. However, by generating and expanding the search area 145 over the time that the moving object is outside of the viewable area, the video processor 112 now has a relatively well-defined area in which to search (e.g., through pixel tracking operations) to find the moving object 140 when it is once again within the viewable area, at Blocks 126-127, thus concluding the method illustrated in FIG. 12 (Block 128).

This significantly increases the probability that the object 140 can be located and tracking resumed. Thus, the video processor 112 may relatively quickly re-acquire the moving object 140 after it exits and re-enters the viewable area, after panning away from and back to the object, etc., to thereby provide enhanced tracking and/or monitoring of objects within georeferenced video feeds. Yet, even if the moving object 140 is not recovered once it is again within the viewable area, its last known location and predicted path are potentially important pieces of information. The system 110 may optionally include one or more geospatial databases 113, which provide the ability to maintain or store known locations of important objects. This may advantageously allow tracking of targets to be resumed by other UAVs or video sensors, even though the object can no longer be tracked by the current sensor.

One exemplary velocity modeling approach is now described. The moving object 140 location in pixel space may be converted to geospatial coordinates, from which the velocity model is generated. The velocity model may take a variety of forms. One straightforward approach is to calculate the velocity of the object 140 as a ratio of distance traveled to time between measurements as follows:

${{v_{obj}(t)} = \frac{\Delta_{pos}}{\Delta \; t}},$

where $\Delta_{pos}$ is the change in position and $\Delta t$ is the time between measurements. An average may then be used to estimate future velocity as follows:

${{v_{obj}\left( {t + 1} \right)} = \frac{\sum\limits_{t = {- n}}^{0}\left( {v_{obj}(t)} \right)}{n}},$

where n is the number of measurements over which the velocity is averaged. More sophisticated alternatives of the velocity model may account for elevation, earth curvature, etc. (Block 124′) to further improve accuracy where desired. Accounting for earth curvature or elevation may be particularly helpful when tracking objects over relatively long distances/measurement intervals, for example.
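The averaged velocity model above might be realized, for example, as in the following sketch, where each fix is a time-stamped position already converted from pixel space into a local flat-earth frame in meters; the helper names and sample track are illustrative only:

```python
def average_velocity(track):
    """Average ground-plane velocity over the last n position fixes.
    Each fix is (t_seconds, x_m, y_m)."""
    vxs, vys = [], []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        vxs.append((x1 - x0) / dt)
        vys.append((y1 - y0) / dt)
    return sum(vxs) / len(vxs), sum(vys) / len(vys)

def predict_position(track, elapsed_s):
    """Predict where the object will be elapsed_s after its last fix,
    assuming it continues at the averaged velocity."""
    vx, vy = average_velocity(track)
    _, x_last, y_last = track[-1]
    return x_last + vx * elapsed_s, y_last + vy * elapsed_s

track = [(0, 0.0, 0.0), (2, 28.0, 1.0), (4, 57.0, 2.5)]   # roughly 14 m/s eastward
print(predict_position(track, 10.0))
```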

At a certain point, it may become appropriate for the video processor 112 to discontinue generating the estimated geospatial location data. For example, if the expandable search area 145 exceeds a threshold, such as a size threshold, or a threshold time for position estimation, at Block 129′, then the search area may have expanded to the point that it is no longer beneficial for re-acquiring the object 140 for tracking. That is, the search area may have become so large that there is no practical benefit to continuing expansion of the search area 145 and incurring the processing/memory overhead requirements associated therewith. The length or size of such thresholds may vary based upon the particular implementation, and could be changed from one implementation or application to the next. Factors that may affect the duration or size of the threshold include the nature of the objects being tracked, their ability to change directions (e.g., complexity of the road system), expected velocities of the objects in a given environment, etc., as will be appreciated by those skilled in the art. For example, it may be desirable to track a vehicle traveling along a long, straight dirt road where the top speed may be relatively slow, as opposed to a vehicle in a metropolitan area where there is ready access to high-speed interstates that go in many different directions.

Referring additionally to FIGS. 15 through 19, another exemplary video processing system 150 and associated method aspects are now described. In applications such as aerial surveillance, targeting, and research systems, for example, geospatial metadata sent from UAVs or other aerial platforms is often not precise enough for position-sensitive activities or determinations. Generally speaking, the system 150 advantageously provides an approach for automatically correcting inaccuracies in geospatial metadata across multiple video feeds due to misaligned frames in the video or through a lack of coordinate precision, for example.

More particularly, the video processing system 150 illustratively includes one or more video ingest modules 151 for receiving a plurality of georeferenced video feeds each comprising a sequence of video frames and initial geospatial metadata associated therewith. Moreover, each georeferenced video feed has a respective different geospatial accuracy level associated therewith. In the illustrated example, there are two georeferenced video feeds, but other numbers of feeds may be used in some embodiments as well.

The system 150 further illustratively includes a video processor 152 coupled to the video ingest module 151 that is configured to perform image registration among the plurality of georeferenced video feeds, at Block 171. Moreover, the video processor 152 further generates corrected geospatial metadata for at least one of the georeferenced video feeds based upon the initial geospatial metadata, the image registration, and the different geospatial accuracy levels, at Block 172, thus concluding the method illustrated in FIG. 17.

The system 150 may thereby provide automatic real-time metadata correction that may use geospatial metadata to find a general area of reference between two or more sensor feeds (UAVs, stationary cameras, etc.), and use a predefined accuracy metric to determine which feed is more accurate. For example, some sensor feeds that produce full motion video (30 fps) are less accurate than high definition surveillance feeds (<15 fps) that are captured at a higher altitude. The video processor 152 may perform image registration not only against reference images, which may be stored in a geospatial image database 153′, but also may perform image registration between the overlapping portions of different video frames.

More particularly, as the video feeds are being ingested, their respective geospatial metadata is used by the video processor 152 to find a common region of interest 191 between the feeds, typically corresponding to a landmark. In some applications, the reference geospatial images in the database 153′ may be used as well. The video image frames (and, optionally, images from the database 153′) are used to perform the image registration around the common region of interest 191.

In the example of FIG. 19, four different aerial sensors are used to generate a georeferenced video feed for the area surrounding a building 190, and the particular area of interest 197 corresponds to a specific landmark on the building, namely a dome 191. The coordinates for the dome 191 resulting from the first video feed result in a point 192 (represented with a star) within a few meters of the dome, which is therefore the most accurate of the video feeds. Points 193 and 194 are from the second and third video sensors, respectively, and are farther away from the dome 191. The fourth sensor video feed is the least accurate, and provides a geospatial coordinate set for the dome 191 that is approximately two hundred meters away and in the middle of a completely different building 196 due to a floating point imprecision associated with the fourth sensor.

Accuracy metrics for the various sensor types are typically known or may be measured prior to video capture, as will be appreciated by those skilled in the art. Once the image registration has been performed, with the benefit of the accuracy metrics the video processor 152 may automatically correct the geospatial metadata for video frames in one or more of the video feeds using a metadata correction algorithm. Depending upon the given implementation, the correction algorithm may be relatively straightforward or more complex, depending upon the speed and accuracy required. By way of example, for real-time applications, faster and slightly less accurate algorithms may be used. One straightforward approach is to correct the metadata for the less accurate sensor with that of the most accurate sensor (i.e., based upon their respective accuracy metrics). Thus, using this straightforward algorithm, the video processor 152 would determine which video feed from the provided video feeds is from the sensor with the greatest accuracy, and it would perform the correction based upon the metadata therefrom.
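That "most accurate sensor wins" algorithm might look like the following sketch, in which the accuracy metric is expressed as a nominal error in meters (smaller is better); the feed identifiers and values are hypothetical:

```python
def simplest_correction(feed_estimates, accuracy_metrics):
    """Replace every feed's coordinate for the registered landmark with the
    estimate from the feed whose predefined accuracy metric is best."""
    best_feed = min(accuracy_metrics, key=accuracy_metrics.get)
    return {feed: feed_estimates[best_feed] for feed in feed_estimates}

estimates = {"sensor1": (35.51560, -117.26540),   # within a few meters
             "sensor4": (35.51740, -117.26720)}   # roughly 200 m off
metrics = {"sensor1": 3.0, "sensor4": 200.0}      # nominal error, meters
print(simplest_correction(estimates, metrics))
```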

A somewhat more sophisticated approach is to use the predefined accuracy ratings to rank each sensor feed. This approach uses a weighted average of the metadata from all of the feeds to determine the new or corrected geospatial metadata based on their respective accuracy rankings, at Block 172′. One exemplary algorithm for performing the weighted average is as follows:

$G = \frac{\sum\limits_{i = 1}^{N} \left( \frac{R_{i}}{T} \right) O_{i}}{\sum\limits_{i = 1}^{N} \left( \frac{R_{i}}{T} \right)}$

where G=new corrected geospatial metadata, N=number of sensors, R=sensor ranking, T=total of the rankings, and O=old geospatial metadata.
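A direct reading of that weighted-average formula in Python might look like the sketch below, applied per coordinate component; the rankings and coordinate estimates are hypothetical values patterned on the FIG. 19 example (four sensors, the fourth roughly two hundred meters off):

```python
def weighted_correction(coords, rankings):
    """Weighted-average metadata correction for one registered landmark.
    coords: list of (lat, lon) estimates, one per sensor feed.
    rankings: predefined accuracy ranking R_i per feed (higher = more accurate).
    Weights are R_i / T, where T is the sum of the rankings."""
    total = float(sum(rankings))
    weights = [r / total for r in rankings]
    lat = sum(w * c[0] for w, c in zip(weights, coords)) / sum(weights)
    lon = sum(w * c[1] for w, c in zip(weights, coords)) / sum(weights)
    return lat, lon

coords = [(35.51560, -117.26540),   # sensor 1, most accurate
          (35.51575, -117.26560),   # sensor 2
          (35.51590, -117.26580),   # sensor 3
          (35.51740, -117.26720)]   # sensor 4, least accurate
print(weighted_correction(coords, rankings=[4, 3, 2, 1]))
```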

The video processing system 150′ also illustratively includes a geospatial metadata database 154′ coupled to the video processor 152′ for storing the corrected geospatial metadata. A geospatial video database or storage device 155′ is coupled to the video processor 152′ for storing the sequence of video images for each video feed. In some embodiments, some or all of the data may be combined into a common database, for example.

The system 150′ further illustratively includes a display 156′ coupled to the video processor 152′, which is configured to display the sequence of video frames of one or more of the georeferenced video feeds on the display and with the corrected geospatial metadata associated therewith, at Block 177′. Thus, for example, when the video feed for the fourth sensor noted above is displayed, rather than providing a geospatial location that is approximately two hundred meters off when the user selects the dome 191, the user would instead be provided with the corrected geospatial coordinates.

Again depending upon the speed and accuracy level required, the video processor 152′ may perform the correction operations on an interval basis, rather than on every frame. That is, the video processor 152′ may generate the corrected geospatial metadata every N number of video frames, where N is greater than 1. In addition to correcting inaccurate geospatial data for a given video feed, in some instances the video feed may have missing geospatial metadata due to errors, etc. In such a case, the video processor 152′ may be further configured to fill in the missing geospatial metadata using the same approach outlined above, i.e., based upon the initial geospatial metadata, the image registration, and the different geospatial accuracy levels, at Blocks 175′-176′.
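An interval-based correction pass of this kind could be organized roughly as below; this is a sketch under assumed frame-dictionary field names, with the chosen correction algorithm (e.g., the weighted average sketched above) passed in as a function:

```python
def correct_feed_metadata(frames, correct_fn, interval=30):
    """Illustrative interval-based pass: recompute a corrected coordinate
    only every `interval` frames (N > 1), and also for any frame whose own
    metadata is missing, provided overlapping feeds supply per-sensor
    estimates for that frame."""
    for idx, frame in enumerate(frames):
        needs_fix = (idx % interval == 0) or frame.get("own_coords") is None
        if needs_fix and frame.get("sensor_coords"):
            frame["corrected_coords"] = correct_fn(frame["sensor_coords"],
                                                   frame["rankings"])
    return frames
```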

The above-described approach may advantageously be implemented on a platform-independent basis. As such, with little or no operator intervention, the geospatial information in the video frames may be automatically corrected to produce a more accurate georeferenced video than relying on raw sensor video alone. Moreover, the system 150 also advantageously provides ingest and metadata correction abilities for new video streams where reference imagery is not otherwise available, but other, more accurate aerial sensor video feeds are. Further, the corrected metadata and video feed may be respectively stored in the geospatial metadata database 154′ and geospatial video database 155′ to provide the video analyst with accurate georeferenced video to perform future metadata correction (i.e., from archived video feeds), as opposed to real-time or live video feeds.

The systems 150, 150′ therefore advantageously may save users time and money by automatically correcting frames in a video feed(s) which would otherwise have inaccurate geospatial information. These systems may advantageously be used in a variety of applications for the government and civilian sectors where relatively accurate georeferenced video streams are required, such as targeting systems, surveillance systems, and aerial mapping, for example.

The above-described systems may be implemented in various video processing platforms, such as the above-described FAME™ system, for example. It should also be noted that some or all of the aspects of the systems and methods, which were described separately above for clarity of illustration, may also be combined in a single system or method, as will be readily appreciated by those skilled in the art.

Many modifications and other embodiments of the invention will come to the mind of one skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is understood that the invention is not to be limited to the specific embodiments disclosed, and that modifications and embodiments are intended to be included within the scope of the appended claims.

1. A video processing system comprising: a display; at least one geospatial database; and a video processor cooperating with said display and said at least one geospatial database and configured to display a georeferenced video feed on said display and defining a viewable area, and overlay selected geospatially-tagged metadata onto the viewable area and relating to a geolocation outside the viewable area.
2. The video processing system of claim 1 wherein the selected geospatially-tagged metadata comprises at least one of geospatially referenced feature annotations, geospatially referenced video source locations, and geospatially referenced points of interest.
3. The video processing system of claim 1 wherein said video processor comprises an overlay generator configured to overlay the selected geospatially-tagged metadata onto the viewable area.
4. The video processing system of claim 3 wherein said overlay generator generates at least one indicator for the selected geospatially-tagged metadata.
5. The video processing system of claim 4 wherein the at least one indicator comprises at least one of a range indicator and a bearing indicator.
6. The video processing system of claim 1 wherein said video processor comprises a request handler configured to accept a query of said at least one geospatial database to generate the selected geospatially-tagged metadata.
7. The video processing system of claim 6 wherein the query is based upon at least one filtering parameter.
8. The video processing system of claim 7 wherein the at least one filtering parameter comprises at least one of a subject category filtering parameter, and a distance filtering parameter.
9. The video processing system of claim 1 wherein said at least one geospatial database comprises: a first geospatial database of fixed geospatially-tagged metadata; and a second geospatial database of variable geospatially-tagged metadata.
10. The video processing system of claim 1 wherein the video feed comprises a live video feed.
11. A video processing system comprising: a display; at least one geospatial database; and a video processor cooperating with said display and said at least one geospatial database and configured to display a georeferenced video feed on said display and defining a viewable area and comprising a request handler configured to accept a query of said at least one geospatial database to generate selected geospatially-tagged metadata, and an overlay generator configured to overlay the selected geospatially-tagged metadata onto the viewable area and relating to a geolocation outside the viewable area.
12. The video processing system of claim 11 wherein the selected geospatially-tagged metadata comprises at least one of geospatially referenced feature annotations, geospatially referenced video source locations, and geospatially referenced points of interest.
13. The video processing system of claim 11 wherein said marker handler generates at least one indicator for the selected geospatially-tagged metadata.
14. The video processing system of claim 13 wherein the at least one indicator comprises at least one of a range and bearing.
15. A video processor for use with a display and at least one geospatial database, the video processor comprising: a marker handler configured to determine selected geospatially-tagged metadata; and an overlay generator configured to display a georeferenced video feed on the display and defining a viewable area, and overlay the selected geospatially-tagged metadata onto the viewable area and relating to a geolocation outside the viewable area.
16. The video processor of claim 15 wherein the selected geospatially-tagged metadata comprises at least one of geospatially referenced feature annotations, geospatially referenced video source locations, and geospatially referenced points of interest.
17. The video processor of claim 15 wherein said video processor comprises an overlay generator configured to overlay the selected geospatially-tagged metadata onto the viewable area.
18. A video processing method comprising: displaying a georeferenced video feed on a display and defining a viewable area; and overlaying selected geospatially-tagged metadata onto the viewable area and relating to a geolocation outside the viewable area.
19. The method of claim 18 wherein the selected geospatially-tagged metadata comprises at least one of geospatially referenced feature annotations, geospatially referenced video source locations, and geospatially referenced points of interest.
20. The method of claim 18 further comprising generating at least one indicator for the selected geospatially-tagged metadata comprising at least one of a range indicator and a bearing indicator.